Language Independent Ranked Retrieval with NeWT

Authors

  • J. Shane Culpepper
  • Michiko Yasukawa
  • Falk Scholer

Abstract

In this paper, we present a novel approach to language-independent ranked document retrieval using our new self-index search engine, NeWT. To our knowledge, this is the first experimental study of ranked self-indexing for multilingual Information Retrieval tasks. We evaluate the query effectiveness of our indexes using Japanese and English. We explore the impact that linguistic processing, stemming, and stopping have on our character-aligned indexes, and present the advantages and challenges discovered during our initial evaluation.
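
The key idea behind a character-aligned index is that it needs no word segmentation, so the same index and search code serve Japanese and English alike. The following is a minimal sketch of that idea, assuming a plain, uncompressed suffix array in Python. A real self-index such as NeWT stores a compressed structure that also replaces the text itself; this toy keeps the text and an uncompressed array for clarity. The function names `build_suffix_array` and `find_occurrences` are hypothetical, chosen for illustration only.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`, sorted by suffix.

    Naive construction (sorts full suffix copies): fine for a toy example,
    far too slow and memory-hungry for a real collection.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return every position where `pattern` occurs in `text`, ascending.

    All suffixes that begin with `pattern` form one contiguous run in the
    suffix array, so two binary searches locate the whole run. No tokenizer
    or language-specific rule is involved at any point.
    """
    m = len(pattern)
    # Truncating sorted suffixes to m characters keeps them in sorted order.
    # (Materializing every prefix costs O(n*m); a real index compares in place.)
    prefixes = [text[i:i + m] for i in sa]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    english = "to be or not to be"
    print(find_occurrences(english, build_suffix_array(english), "to be"))  # [0, 13]

    japanese = "東京都に住む。京都に行く。"
    print(find_occurrences(japanese, build_suffix_array(japanese), "京都"))  # [1, 7]
```

Because matching works on characters, `京都` (Kyoto) is also found at position 1, inside `東京都` (Tokyo Metropolis). No segmenter was needed, but that match crosses a word boundary, which is one plausible instance of the kind of challenge character-aligned indexes raise for ranking.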


Similar Resources

RMIT and Gunma University at NTCIR-9 Intent Task

In this report, we describe our experimental results for the NTCIR-9 intent task. For our experiments, we use our experimental search engine, Newt. Newt is a ranked self-index capable of supporting multiple languages by deferring linguistic decisions until query time. To our knowledge, this is the first Information Retrieval task on the ClueWeb09-JA collection performed entirely with ranked self...

RMIT at TREC 2011 Microblog Track

This paper describes our submission to the TREC 2011 microblog task. For the experiments, we use our new self-index search engine, NeWT, to support ranked search in the Twitter document corpus. We use a combination of phrase queries and degrading conjunctive Boolean intersection to improve retrieval effectiveness. Keywords: self-index; full-text search; phrases; threshold; intersection

CIRQuL - Complex Information Retrieval Query Language

In this paper, we present a new framework for the retrieval of XML documents. We describe extensions to existing query languages (XPath and XQuery) geared toward ranked information retrieval and full-text search in XML documents. Furthermore, we present language models for ranked information retrieval applied to XML and describe the ultimate goal of our research.

A Wikipedia-Based Multilingual Retrieval Model

This paper introduces CL-ESA, a new multilingual retrieval model for the analysis of cross-language similarity. The retrieval model exploits the multilingual alignment of Wikipedia: given a document $d$ written in language $L$, we construct a concept vector $\vec{d}$ for $d$, where each dimension $i$ in $\vec{d}$ quantifies the similarity of $d$ with respect to a document $d_i$ chosen from the “L-subset” of Wikipedia. Likew...

Ad Hoc Retrieval of Documents with Topical Opinion

With a growing amount of subjective content distributed across the Web, there is a need for a domain-independent information retrieval system that would support ad hoc retrieval of documents expressing opinions on a specific topic of the user’s query. In this paper we present a lightweight method for ad hoc retrieval of documents which contain subjective content on the topic of the query. Docum...

Publication date: 2011